Abstract

Existing approaches to analyzing the asymptotics of graph Laplacians typically assume a well-behaved kernel function with smoothness assumptions. We remove the smoothness assumption and generalize the analysis of graph Laplacians to include previously unstudied graphs, including kNN graphs. We also introduce a kernel-free framework to analyze graph constructions with shrinking neighborhoods in general and apply it to analyze locally linear embedding (LLE). We also describe how, for a given limiting Laplacian operator, desirable properties such as a convergent spectrum and sparseness can be achieved by choosing the appropriate graph construction.



1 Introduction 

Graph Laplacians have become a core technology throughout machine learning. In particular, they have appeared in clustering (Kannan et al., 2004; von Luxburg et al., 2008), dimensionality reduction (Belkin & Niyogi, 2003; Nadler et al., 2006), and semi-supervised learning (Belkin & Niyogi, 2004; Zhu et al., 2003).

While graph Laplacians are but one member of a broad class of methods 
that use local neighborhood graphs to model data lying on a low-dimensional 
manifold embedded in a high-dimensional space, they are distinguished by their 
appealing mathematical properties, notably: (1) the graph Laplacian is the in- 
finitesimal generator for a random walk on the graph, and (2) it is a discrete ap- 
proximation to a weighted Laplace-Beltrami operator on a manifold, an operator 
which has numerous geometric properties and induces a smoothness functional. 
These mathematical properties have served as a foundation for the development 
of a growing theoretical literature that has analyzed learning procedures based 
on the graph Laplacian. To review briefly, Bousquet et al. (2003) proved an early result for the convergence of the unnormalized graph Laplacian to a regularization functional that depends on the squared density. Belkin & Niyogi (2005) demonstrated the pointwise convergence of the empirical unnormalized Laplacian to the Laplace-Beltrami operator on a compact manifold with uniform density. Lafon (2004) and Nadler et al. (2006) established a connection between graph Laplacians and the infinitesimal generator of a diffusion process. They further showed that one may use the degree operator to control the effect of the density. Hein et al. (2005) combined and generalized these results for weak and pointwise (strong) convergence under weaker assumptions, and provided rates for the unnormalized, normalized, and random walk Laplacians. They also made explicit the connections to the weighted Laplace-Beltrami operator. Singer (2006) obtained improved convergence rates for a uniform density. Giné & Koltchinskii (2005) established a uniform convergence result and functional central limit theorem to extend the pointwise convergence results. von Luxburg et al. (2008) and Belkin & Niyogi (2006) presented spectral convergence results for the eigenvectors of graph Laplacians in the fixed and shrinking bandwidth cases respectively.

Although this burgeoning literature has provided many useful insights, several gaps remain between theory and practice. Most notably, in constructing the neighborhood graphs underlying the graph Laplacian, several choices must be made, including the choice of algorithm for constructing the graph, with k-nearest-neighbor (kNN) and kernel functions providing the main alternatives, as well as the choice of parameters ($k$, kernel bandwidth, normalization weights). These choices can lead to the graph Laplacian generating fundamentally different random walks and approximating different weighted Laplace-Beltrami operators. The existing theory has focused on one specific choice in which graphs are generated with smooth kernels with shrinking bandwidths. But a variety of other choices are often made in practice, including kNN graphs, r-neighborhood graphs, and the "self-tuning" graphs of Zelnik-Manor & Perona (2004). Surprisingly, few of the existing convergence results apply to these choices (see Maier et al. (2008) for an exception).



This paper provides a general theoretical framework for analyzing graph Laplacians and operators that behave like Laplacians. Our point of view differs from that found in the existing literature; specifically, our point of departure is a stochastic process framework that utilizes the characterization of diffusion processes via drift and diffusion terms. This yields a general kernel-free framework for analyzing graph Laplacians with shrinking neighborhoods. We use it to extend the pointwise results of Hein et al. (2007) to cover non-smooth kernels and introduce location-dependent bandwidths. Applying these tools we are able to identify the asymptotic limit for a variety of graph constructions including kNN, r-neighborhood, and "self-tuning" graphs. We are also able to provide an analysis for Locally Linear Embedding (Roweis & Saul, 2000).

A practical motivation for our interest in graph Laplacians based on kNN graphs is that these can be significantly sparser than those constructed using kernels, even if they have the same limit. Our framework allows us to establish this limiting equivalence. On the other hand, we can also exhibit cases in which kNN graphs converge to a different limit than graphs constructed from kernels, which explains some cases where kNN graphs perform poorly. Moreover, our framework allows us to generate new algorithms: in particular, by using location-dependent bandwidths we obtain a class of operators that have nice spectral convergence properties that parallel those of the normalized Laplacian (von Luxburg et al., 2008), but which converge to a different class of limits.



2 The Framework 

Our work exploits the connections among diffusion processes, elliptic operators (in particular the weighted Laplace-Beltrami operator), and stochastic differential equations (SDEs). This builds upon the diffusion process viewpoint in Nadler et al. (2006). Critically, we make the connection to the drift and diffusion terms of a diffusion process. This allows us to present a kernel-free framework for analysis of graph Laplacians as well as giving a better intuitive understanding of the limit diffusion process.

We first give a brief overview of these connections and present our general 
framework for the asymptotic analysis of graph Laplacians as well as provid- 
ing some relevant background material. We then introduce our assumptions 
and derive our main results for the limit operator for a wide range of graph 
construction methods. We use these to calculate asymptotic limits for specific 
graph constructions. 

2.1 Relevant Differential Geometry 

Assume $\mathcal{M}$ is an $m$-dimensional manifold embedded in $\mathbb{R}^b$. To identify the asymptotic infinitesimal generator of a diffusion on this manifold, we will derive the drift and diffusion terms in normal coordinates at each point. We refer the reader to Boothby (1986) for an exact definition of normal coordinates. For our purposes it suffices to note that normal coordinates are coordinates in $\mathbb{R}^m$ that behave roughly as if the neighborhood was projected onto the tangent plane at $x$. The extrinsic coordinates are the coordinates $\mathbb{R}^b$ in which the manifold is embedded. Since the density, and hence integration, is defined with respect to the manifold, we must link normal coordinates $s$ around a point $x$ with the extrinsic coordinates $y$. This relation may be given as follows:

$$y - x = H_x s + L_x(ss^T) + O(\|s\|^3), \tag{1}$$

where $H_x$ is a linear isomorphism between the normal coordinates in $\mathbb{R}^m$ and the $m$-dimensional tangent plane $T_x$ at $x$. $L_x$ is a linear operator describing the curvature of the manifold and takes $m \times m$ positive semidefinite matrices into the space orthogonal to the tangent plane, $T_x^\perp$. More advanced readers will note that this statement is Gauss' lemma and that $H_x$ and $L_x$ are related to the first and second fundamental forms.






We are most interested in limits involving the weighted Laplace-Beltrami 
operator, a particular second-order differential operator. 



2.2 Weighted Laplace-Beltrami operator 

Definition 1 (Weighted Laplace-Beltrami operator). The weighted Laplace-Beltrami operator with respect to the density $q$ is the second-order differential operator defined by $\Delta_q := \Delta_{\mathcal{M}} - \frac{\nabla q^T}{q}\nabla$ where $\Delta_{\mathcal{M}} := -\mathrm{div} \circ \nabla$ is the unweighted Laplace-Beltrami operator.

It is of particular interest since it induces a smoothing functional for $f \in C^2(\mathcal{M})$ with support contained in the interior of the manifold:

$$\langle f, \Delta_q f\rangle_{L_2(q)} = \|\nabla f\|^2_{L_2(q)}. \tag{2}$$

Note that existing literature on asymptotics of graph Laplacians often refers to the $s^{th}$ weighted Laplace-Beltrami operator as $\Delta_s$ where $s \in \mathbb{R}$. This is $\Delta_{p^s}$ in our notation. For more information on the weighted Laplace-Beltrami operator see Grigor'yan (2006).



2.3 Equivalence of Limiting Characterizations 

We now establish the promised connections among elliptic operators, diffusions, SDEs, and graph Laplacians. We first show that elliptic operators define diffusion processes and SDEs and vice versa. An elliptic operator $\mathcal{G}$ is a second order differential operator of the form

$$\mathcal{G}f(x) = \sum_{ij} a_{ij}(x)\frac{\partial^2 f(x)}{\partial x_i \partial x_j} + \sum_i b_i(x)\frac{\partial f(x)}{\partial x_i} + c(x)f(x),$$

where the $m \times m$ coefficient matrix $(a_{ij}(x))$ is positive semidefinite for all $x$. If we use normal coordinates for a manifold, we see that the weighted Laplace-Beltrami operator $\Delta_q$ is, up to an overall sign, a special case of an elliptic operator with $(a_{ij}(x)) = I$, the identity matrix, $b(x) = \nabla q(x)/q(x)$, and $c(x) = 0$. Diffusion processes are related via a result by Dynkin which states that given a diffusion process, the generator of the process is an elliptic operator.

The (infinitesimal) generator $\mathcal{G}$ of a diffusion process $X_t$ is defined as

$$\mathcal{G}f(x) := \lim_{t \to 0} \frac{\mathbb{E}_x f(X_t) - f(x)}{t}$$

when the limit exists and convergence is uniform over $x$. Here $\mathbb{E}_x f(X_t) = \mathbb{E}(f(X_t) \mid X_0 = x)$. A converse relation holds as well. The Hille-Yosida theorem characterizes when a linear operator, such as an elliptic operator, is the generator of a stochastic process. We refer the reader to Kallenberg (2002) for proofs.

A time-homogeneous stochastic differential equation (SDE) defines a diffusion process as a solution (when one exists) to the equation

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t,$$

where $X_t$ is a diffusion process taking values in $\mathbb{R}^b$ and $W_t$ is a Brownian motion. The terms $\mu(x)$ and $\sigma(x)\sigma(x)^T$ are the drift and diffusion terms of the process.

By Dynkin's result, the generator $\mathcal{G}$ of this process defines an elliptic operator and a simple calculation shows the operator is

$$\mathcal{G}f(x) = \frac{1}{2}\sum_{ij}\left(\sigma(x)\sigma(x)^T\right)_{ij}\frac{\partial^2 f(x)}{\partial x_i \partial x_j} + \sum_i \mu_i(x)\frac{\partial f(x)}{\partial x_i}.$$

In such diffusion processes there is no absorbing state and the term in the elliptic operator $c(x) = 0$. We note that one may also consider more general diffusion processes where $c(x) \le 0$. When $c(x) < 0$ we have the generator of a diffusion process with killing, where $c(x)$ determines the killing rate of the diffusion at $x$.

To summarize, we see that an SDE or diffusion process defines an elliptic operator, and importantly, the coefficients are the drift and diffusion terms. The reverse relationship also holds: an elliptic operator defines a diffusion under some regularity conditions on the coefficients.

All that remains then is to connect diffusion processes in continuous space to graph Laplacians on a finite set of points. Diffusion approximation theorems provide this connection. We state one version of such a theorem.

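Theorem 2 (Diffusion Approximation). Let $\mu(x)$ and $\sigma(x)\sigma(x)^T$ be drift and diffusion terms for a diffusion process defined on a compact set $S \subset \mathbb{R}^b$, and let $G$ be the corresponding infinitesimal generator. Let $\{Y^{(n)}_t\}_t$ be Markov chains with transition matrices $P_n$ on state spaces $\{x_i\}_{i=1}^n$ for all $n$, and let $c_n > 0$ define a sequence of scalings. Put

$$\hat{\mu}_n(x_i) = c_n \mathbb{E}\left(Y^{(n)}_1 - x_i \mid Y^{(n)}_0 = x_i\right),$$
$$\hat{\sigma}_n(x_i)\hat{\sigma}_n(x_i)^T = c_n \mathrm{Var}\left(Y^{(n)}_1 \mid Y^{(n)}_0 = x_i\right).$$

Let $f \in C^2(S)$. If for all $\epsilon > 0$

$$\hat{\mu}_n(x_i) \to \mu(x_i),$$
$$\hat{\sigma}_n(x_i)\hat{\sigma}_n(x_i)^T \to \sigma(x_i)\sigma(x_i)^T,$$
$$c_n \sup_{i \le n} P\left(\left\|Y^{(n)}_1 - x_i\right\| > \epsilon \mid Y^{(n)}_0 = x_i\right) \to 0,$$

then the generators $A_n f = c_n(P_n - I)f \to Gf$. Furthermore, for any bounded $f$ and $t_0 > 0$, and for the continuous-time transition kernels $T_n(t) = \exp(tA_n)$ and $T$ the transition kernel for $G$, we have $T_n(t)f \to T(t)f$ uniformly in $t$ for $t < t_0$.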
Proof. Write $A$ for the limiting generator $G$. We first examine the case when $f(x) = x$. By assumption,

$$A_n \pi_n x = c_n(P_n - I)x = c_n \mathbb{E}\left(Y^{(n)}_1 - x_i \mid Y^{(n)}_0 = x_i\right) = \hat{\mu}_n(x) \to \mu(x) = Ax.$$

Similarly if $f(x) = xx^T$, $\|A_n \pi_n f - \pi_n A f\|_\infty \to 0$. If $f(x) = 1$, then $A_n \pi_n f = \pi_n A f = 0$. Thus, by linearity of $A_n$, $A_n \pi_n f \to Af$ for any quadratic polynomial $f$.

Taylor expand $f$ to obtain $f(x + h) = q_x(h) + \delta_x(h)$ where $q_x(h)$ is a quadratic polynomial in $h$. Since the second derivative is continuous and the support of $f$ is compact, $\sup_x \delta_x(h) = o(\|h\|^2)$ and $\sup_{x,h} \delta_x(h) < M$ for some constant $M$.

Let $\Delta_n = Y^{(n)}_1 - x$. We may bound $A_n$ acting on the remainder term $\delta_x(h)$ by

$$\sup_x A_n \delta_x = \sup_x c_n \mathbb{E}\left(\delta_x(\Delta_n) \mid Y^{(n)}_0 = x\right)$$
$$\le \sup_x c_n \mathbb{E}\left(\delta_x(\Delta_n)\,1(\|\Delta_n\| < \epsilon) \mid Y^{(n)}_0 = x\right) + M \sup_x c_n P\left(\|\Delta_n\| > \epsilon \mid Y^{(n)}_0 = x\right)$$
$$= o\left(c_n \mathbb{E}\left(\|\Delta_n\|^2 \mid Y^{(n)}_0 = x\right)\right) + M \sup_x c_n P\left(\|\Delta_n\| > \epsilon \mid Y^{(n)}_0 = x\right)$$
$$= o(1)$$

where the last equality holds by the assumptions on the uniform convergence of the diffusion term $\hat{\sigma}_n\hat{\sigma}_n^T$ and on the shrinking jump sizes. Thus, $A_n \pi_n f \to Af$ for any $f \in C^2(\mathcal{M})$.

The class of functions $C^2(\mathcal{M})$ is dense in $L_\infty(\mathcal{M})$ and forms a core for the generator $A$. Standard theorems give equivalence between strong convergence of infinitesimal generators on a core and uniform strong convergence of transition kernels on a Banach space (e.g. Theorem 1.6.1 in Ethier & Kurtz (1986)). □
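
As a sanity check on the scaling in Theorem 2, the following Python sketch (not from the paper; the grid, the test function $\sin$, and the helper names are illustrative assumptions) builds a symmetric nearest-neighbor walk on a one-dimensional grid with spacing $h$ and uses the scaling $c_n = 1/h^2$. The scaled drift is 0 and the scaled variance is 1, so $c_n(P_n - I)f$ should be close to $\frac{1}{2}f''$ at interior points.

```python
import numpy as np

def nearest_neighbor_walk(n):
    """Transition matrix of a symmetric nearest-neighbor walk on n grid points.
    The two boundary states are made absorbing so every row sums to one
    (an illustrative choice, not part of the theorem)."""
    P = np.zeros((n, n))
    for i in range(1, n - 1):
        P[i, i - 1] = 0.5
        P[i, i + 1] = 0.5
    P[0, 0] = 1.0
    P[n - 1, n - 1] = 1.0
    return P

def scaled_generator(f_vals, P, c):
    """Apply A_n = c_n (P_n - I) to a function sampled on the chain's states."""
    return c * (P @ f_vals - f_vals)

n = 401
x = np.linspace(0.0, np.pi, n)
h = x[1] - x[0]
c = 1.0 / h**2                      # scaling c_n; the jump size h shrinks with n
P = nearest_neighbor_walk(n)

Af_n = scaled_generator(np.sin(x), P, c)
Af_limit = -0.5 * np.sin(x)         # (1/2) f'' for drift 0 and unit diffusion

interior = slice(1, n - 1)          # skip the absorbing boundary states
max_err = np.max(np.abs(Af_n[interior] - Af_limit[interior]))
print(max_err)
```

The discrepancy at interior points is the usual second-difference error of order $h^2$, which is why it vanishes as the jump size shrinks.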

We remark that though the results we have discussed thus far are stated in the context of the extrinsic coordinates $\mathbb{R}^b$, we describe appropriate extensions in terms of normal coordinates in the appendix.



2.4 Assumptions 

We describe here the assumptions and notation for the rest of the paper. The 
following assumptions we will refer to as the standard assumptions. 

Unless stated explicitly otherwise, let $f$ be an arbitrary function in $C^2(\mathcal{M})$.

Manifold assumptions. Assume $\mathcal{M}$ is a smooth $m$-dimensional manifold isometrically embedded in $\mathbb{R}^b$ via the map $i : \mathcal{M} \to \mathbb{R}^b$. The essential conditions that we require on the manifold are

1. Smoothness: the map $i$ is a smooth embedding.

2. A single radius $h_0$ such that for all $x \in \mathrm{supp}(f)$, $\mathcal{M} \cap B(x, h_0)$ is a neighborhood of $x$ with normal coordinates, and

3. Bounded curvature of the manifold over $\mathrm{supp}(f)$, i.e. that the second fundamental form is bounded.

When the manifold is smooth and compact, these conditions are satisfied.

Assume points $\{x_i\}_{i=1}^\infty$ are sampled i.i.d. from a density $p \in C^2(\mathcal{M})$ with respect to the natural volume element of the manifold, and that $p$ is bounded away from 0.

Notation. For brevity, we will always use $x, y \in \mathbb{R}^b$ to be points on $\mathcal{M}$ expressed in extrinsic coordinates and $s \in \mathbb{R}^m$ to be normal coordinates for $y$ in a neighborhood centered at $x$. Since they represent the same point, we will also use $y$ and $s$ interchangeably as function arguments, i.e. $f(y) = f(s)$. Whenever we take a gradient, it is with respect to normal coordinates.

Generalized kernel. Though we use a kernel-free framework, our main theorem utilizes a kernel, but one that generalizes previously studied kernels by 1) considering non-smooth base kernels $K_0$, 2) introducing location dependent bandwidth functions $r_x(y)$, and 3) considering general weight functions $w_x(y)$. Our main result also handles 4) random weight and bandwidth functions.

Given a bandwidth scaling parameter $h > 0$, define a new kernel by

$$K(x, y) = w_x(y)\,K_0\left(\frac{\|y - x\|}{h\,r_x(y)}\right). \tag{3}$$

Previously analyzed constructions for smooth kernels with compact support are described by this more general kernel with $r_x(y) = 1$ and $w_x(y) = d(x)^{-\lambda}d(y)^{-\lambda}$ where $d(x)$ is the degree function and $\lambda \in \mathbb{R}$ is some constant.

The directed kNN graph is obtained if $K_0(x, y) = 1(\|x - y\| < 1)$, $r_x(y)$ = distance to the $k^{th}$ nearest neighbor of $x$, and $w_x(y) = 1$ for all $x, y$.

We note that the kernel $K$ is not necessarily symmetric; however, if $r_x(y) = r_y(x)$ and $w_x(y) = w_y(x)$ for all $x, y \in \mathcal{M}$ then the kernel is symmetric and the corresponding unnormalized Laplacian is positive semi-definite.
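
The generalized kernel can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' code: the function names, the sample data, and the choice to absorb $h$ into the bandwidth (so `h=1.0`) are hypothetical, and the indicator uses `<=` so that the $k$-th neighbor itself is counted as an edge.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def generalized_kernel(X, h, K0, r, w):
    """K[i, j] = w[i, j] * K0(||x_j - x_i|| / (h * r[i, j])), with zero diagonal.
    r and w are n-by-n arrays holding r_x(y) and w_x(y) for all sample pairs."""
    K = w * K0(pairwise_dist(X) / (h * r))
    np.fill_diagonal(K, 0.0)
    return K

def knn_radius(X, k):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    return np.sort(pairwise_dist(X), axis=1)[:, k]

indicator = lambda t: (t <= 1.0).astype(float)   # non-smooth base kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                     # assumed sample data
k = 5
rho = knn_radius(X, k)
ones = np.ones((len(X), len(X)))

# Directed kNN: the bandwidth depends only on the source point x.
r_directed = np.repeat(rho[:, None], len(X), axis=1)
K_directed = generalized_kernel(X, 1.0, indicator, r_directed, ones)

# Undirected (OR) kNN: symmetric bandwidth max{rho(x), rho(y)}.
r_or = np.maximum(rho[:, None], rho[None, :])
K_or = generalized_kernel(X, 1.0, indicator, r_or, ones)
```

With a symmetric bandwidth the resulting kernel matrix is symmetric, which is the situation in which the unnormalized Laplacian is positive semi-definite.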

Kernel assumptions. We now introduce our assumptions on the choices $K_0, h, w_x, r_x$ that govern the graph construction. Assume that the base kernel $K_0 : \mathbb{R}_+ \to \mathbb{R}_+$ has bounded variation and compact support and that $h_n > 0$ form a sequence of bandwidth scalings. For (possibly random) location dependent bandwidth and weight functions $r^{(n)}_x(\cdot) > 0$, $w^{(n)}_x(\cdot) > 0$, assume that they converge to $r_x(\cdot), w_x(\cdot)$ respectively and that the convergence is uniform over $x \in \mathcal{M}$. Further assume they have Taylor-like expansions for all $x, y \in \mathcal{M}$ with $\|x - y\| < h_n$,

$$r^{(n)}_x(y) = r_x(x) + \left(\dot{r}_x(x) + \alpha_x\,\mathrm{sign}(u_x^T s)\,u_x\right)^T s + \epsilon^{(n)}_r(x, s),$$
$$w^{(n)}_x(y) = w_x(x) + \nabla w_x(x)^T s + \epsilon^{(n)}_w(x, s), \tag{4}$$

where the approximation error is uniformly bounded by

$$\sup_{x \in \mathcal{M},\,\|s\| < h_n} \left|\epsilon^{(n)}_r(x, s)\right| = O(h_n^2),$$
$$\sup_{x \in \mathcal{M},\,\|s\| < h_n} \left|\epsilon^{(n)}_w(x, s)\right| = O(h_n^2).$$
We briefly motivate the choice of assumptions. The bounded variation condition allows for non-smooth base kernels while retaining enough regularity to obtain limits. The Taylor-like expansions give conditions under which the limit is tractable to compute analytically, and they allow for randomness in the remainder term as long as it is of the correct order. The particular expansion for the location dependent bandwidth allows one to analyze undirected kNN graphs, which exhibit a non-differentiable location dependent bandwidth (see Section 3.3). Note that we do not constrain the general weight functions $w^{(n)}_x(y)$ to be a power of the degree function, $d_n(x)^{\alpha}d_n(y)^{\alpha}$, nor impose a particular functional form for location dependent bandwidths $r_x$. This gives us two degrees of freedom, which allows the same asymptotic limit to be obtained for an entire class of parameters governing the graph construction. In Section 5.5 we discuss how one may choose a graph construction that has more attractive finite sample properties than other constructions that have the same limit.

Functions and convergence. We define here what we mean by convergence when the domains of the functions are changing. We take $g_n \to g$, where $\mathrm{domain}(g_n) = \mathcal{X}_n \subset \mathcal{M}$, to mean $\|g_n - \pi_n g\|_\infty \to 0$ where $\pi_n g = g|_{\mathcal{X}_n}$ is the restriction of $g$ to $\mathcal{X}_n$. Likewise, for operators $T_n$ on functions with domain $\mathcal{X}_n$, we take $T_n g = T_n \pi_n g$. Convergence of operators $T_n \to T$ means $T_n f \to Tf$ for all $f \in C^2(\mathcal{M})$. When $\mathcal{X}_n = \mathcal{M}$ for all $n$, this is convergence in the strong operator topology under the $L_\infty$ norm.

We consider the limit of the random walk Laplacian defined as $L_{rw} = I - D^{-1}W$ where $I$ is the identity, $W$ is the matrix of edge weights, and $D$ is the diagonal degree matrix.
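
To make the definition concrete, here is a minimal NumPy sketch (the six collinear sample points, the radius, and the helper names are illustrative assumptions). It builds the weights of an r-neighborhood graph and forms $L_{rw} = I - D^{-1}W$; the matrix $P = D^{-1}W = I - L_{rw}$ is the transition matrix of the random walk on the graph.

```python
import numpy as np

def r_neighborhood_weights(X, radius):
    """0/1 edge weights: connect distinct points closer than `radius`."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (dist < radius).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def random_walk_laplacian(W):
    """L_rw = I - D^{-1} W, where D is the diagonal matrix of row sums of W."""
    deg = W.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every node needs at least one neighbor")
    return np.eye(len(W)) - W / deg[:, None]

# Six equally spaced points on a line; radius 0.15 links adjacent points only.
X = (0.1 * np.arange(6)).reshape(-1, 1)
W = r_neighborhood_weights(X, radius=0.15)
L_rw = random_walk_laplacian(W)
P = np.eye(len(W)) - L_rw           # random walk transition matrix D^{-1} W
```

Each row of `L_rw` sums to zero, which is the discrete counterpart of a generator with no killing term.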

2.5 Main Theorem 

Our main result is stated in the following theorem. 

Theorem 3. Assume the standard assumptions hold eventually with probability 1. If the bandwidth scalings $h_n$ satisfy $h_n \downarrow 0$ and $nh_n^{m+2}/\log n \to \infty$, then for graphs constructed using the kernels

$$K_n(x, y) = w^{(n)}_x(y)\,K_0\left(\frac{\|y - x\|}{h_n\,r^{(n)}_x(y)}\right) \tag{5}$$

there exists a constant $Z_{K_0,m} > 0$ depending only on the base kernel $K_0$ and the dimension $m$ such that for $c_n = Z_{K_0,m}/h_n^2$,

$$-c_n L^{(n)}_{rw} f \to Af$$

where $A$ is the infinitesimal generator of a diffusion process with the following drift and diffusion terms given in normal coordinates:

$$\mu_s(x) = r_x(x)^2\left(\frac{\nabla p(x)}{p(x)} + \frac{\nabla w_x(x)}{w_x(x)} + (m + 2)\frac{\dot{r}_x(x)}{r_x(x)}\right),$$
$$\sigma_s(x)\sigma_s(x)^T = r_x(x)^2 I,$$

where $I$ is the $m \times m$ identity matrix.

Proof. We apply the diffusion approximation theorem (Theorem 2) to obtain convergence of the random walk Laplacians. Since $h_n \downarrow 0$, the probability of a jump of size $> \epsilon$ equals 0 eventually. Thus, we simply need to show uniform convergence of the drift and diffusion terms and identify their limits. We leave the detailed calculations to the appendix and present the main ideas of the proof here.

We first assume that $K_0$ is an indicator kernel. To generalize, we note that for kernels of bounded variation, we may write $K_0(x) = \int 1(|x| < z)\,d\eta_+(z) - \int 1(|x| < z)\,d\eta_-(z)$ for some finite positive measures $\eta_-, \eta_+$ with compact support. The result for general kernels then follows from Fubini's theorem.

We also initially assume that we are given the true density $p$. After identifying the desired limits given the true density, we show that the empirical version converges uniformly to the correct quantities.

The key calculation is Lemma 7 in the appendix, which establishes that integrating against an indicator kernel is like integrating over a sphere re-centered on $h_n^2\dot{r}_x(x)$.

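Given this calculation and by Taylor expanding the non-kernel terms, one obtains the infinitesimal first and second moments and the degree operator:

$$M^{(n)}_1(x) = \frac{1}{h_n^m}\int s\,K_n(x, y)\,p(y)\,ds \tag{6}$$
$$= C_{K_0,m}\,h_n^2\,r_x(x)^{m+2}\left(w_x(x)\frac{\nabla p(x)}{m + 2} + p(x)\frac{\nabla w_x(x)}{m + 2} + w_x(x)p(x)\frac{\dot{r}_x(x)}{r_x(x)} + o(1)\right),$$

$$M^{(n)}_2(x) = \frac{1}{h_n^m}\int ss^T K_n(x, y)\,p(y)\,ds = \frac{C_{K_0,m}}{m + 2}\,h_n^2\,r_x(x)^{m+2}\left(w_x(x)p(x)I + O(h_n)\right), \tag{7}$$

$$d_n(x) = \frac{1}{h_n^m}\int K_n(x, y)\,p(y)\,ds = \frac{1}{h_n^m}\int \left(w_x(x) + O(h_n)\right)\left(p(x) + O(h_n)\right)K_n\,ds = C'_{K_0,m}\,r_x(x)^m\left(w_x(x)p(x) + O(h_n)\right), \tag{8}$$

where $C_{K_0,m} = \int u^{m+2}\,d\eta$, $C'_{K_0,m} = \int u^m\,d\eta$ and $\eta$ is the signed measure $\eta = \eta_+ - \eta_-$.

Let $Z_{K_0,m} = (m + 2)\frac{C'_{K_0,m}}{C_{K_0,m}}$ and $c_n = Z_{K_0,m}/h_n^2$. Since $K_n/d_n$ define Markov transition kernels, taking the limits $\mu_s(x) = \lim_{n \to \infty} c_n M^{(n)}_1(x)/d_n(x)$ and $\sigma_s(x)\sigma_s(x)^T = \lim_{n \to \infty} c_n M^{(n)}_2(x)/d_n(x)$ and applying the diffusion approximation theorem gives the stated result.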

To more formally apply the diffusion approximation theorem we may calculate the drift and diffusion in extrinsic coordinates. In extrinsic coordinates, we have

$$\mu(x) = r_x(x)^2 H_x\left(\frac{\nabla p(x)}{p(x)} + \frac{\nabla w_x(x)}{w_x(x)} + (m + 2)\frac{\dot{r}_x(x)}{r_x(x)}\right) + r_x(x)^2 L_x(I),$$
$$\sigma(x)\sigma(x)^T = r_x(x)^2\,\Pi_{T_x},$$

where $\Pi_{T_x}$ is the projection onto the tangent plane at $x$, and $H_x$ and $L_x$ are the linear mappings between normal coordinates and extrinsic coordinates defined in Eqn. (1).

We now consider the convergence of the empirical quantities. For non-random $r^{(n)}_x = r_x$, $w^{(n)}_x = w_x$, the uniform and almost sure convergence of the empirical quantities to the true expectation follows from an application of Bernstein's inequality. In particular, the value of $F_n(x, S) = S\,K_n(x, Y)$ is bounded by $K_{max}h_n$, where $S$ is $Y$ in normal coordinates and $K_{max}$ depends on the kernel and the maximum curvature of the manifold. Furthermore, the second moment calculation for $M^{(n)}_2$ gives that the variance $\mathrm{Var}(F_n(x, S))$ is bounded by $ch_n^{m+2}$ for some constant $c$ that depends on $K$ and the max of $p$, and does not depend on $x$. By Bernstein's inequality and a union bound, we have a bound of the form

$$P\left(\sup_{i \le n}\left\|\mathbb{E}_n F_n(x_i, S) - \mathbb{E}F_n(x_i, S)\right\| > \epsilon h_n^{m+2}\right) \le 2n\exp\left(-\frac{n\epsilon^2 h_n^{m+2}}{2c + \frac{2}{3}K_{max}\epsilon h_n}\right). \tag{10}$$


The uniform convergence a.s. of the first moment follows from Borel-Cantelli. 
Similar inequalities are attained for the empirical second moment and degree 
terms. 

Now assume $r^{(n)}_x, w^{(n)}_x$ are random and define $F_n$ as before. To handle the random weight and bandwidth function case, we first choose deterministic weight and bandwidth functions to maximize the first moment under a constraint that is satisfied eventually a.s. Define $F_{\kappa,n}$ as $F_n$ with the weight and bandwidth functions replaced by the deterministic functions

$$\tilde{w}_x(y) = w_x(y) + \kappa h_n^2\,\mathrm{sign}(s_i),$$
$$\tilde{r}_x(y) = r_x(x) + \left(\dot{r}_x(x) + \alpha_x\,\mathrm{sign}(u_x^T s)\,u_x\right)^T s - \kappa h_n^2\,\mathrm{sign}(s_i),$$

for some constant $\kappa$ such that these functions bound the random $r^{(n)}_x$ and $w^{(n)}_x$ eventually. This is possible since the perturbation terms $\epsilon^{(n)}_r(x, s), \epsilon^{(n)}_w(x, s) = O(h_n^2)$. Thus, we have $F_{\kappa,n}(x, y) \ge F_n(x, y)$ for all $x, y \in \mathcal{M}$ eventually with probability 1. Since $F_{\kappa,n}(x, Y)$ uses deterministic weight and bandwidth functions, we obtain i.i.d. random variables and may apply the Bernstein bound on $F_{\kappa,n}(x, Y)$ to obtain an upper bound on the empirical quantities, namely $\mathbb{E}_n F_{\kappa,n}(x, Y) \ge \mathbb{E}_n F_n(x, Y)$ for all $x \in \mathcal{M}$ eventually with probability 1. We may similarly obtain a lower bound. By Lemma 10, the difference between the expectation of the upper bound and that of the unperturbed quantity ($\kappa = 0$) is $\mathbb{E}F_{\kappa,n}(x, Y) - \mathbb{E}F_{0,n}(x, Y) = o(\kappa h_n^{m+2})$. Applying the squeeze theorem gives a.s. uniform convergence of the empirical first moment $M^{(n)}_1/h_n^2$. The degree and second moment terms are handled similarly.

Since $p, w_x, r_x$ are all assumed to be bounded away from 0, the scaled degree operators $d_n$ are eventually bounded away from 0 with probability 1, and the continuous mapping theorem applied to $\frac{M^{(n)}_i/h_n^2}{d_n}$ gives a.s. uniform convergence of the drift and diffusion. □

2.6 Unnormalized and Normalized Laplacians

While our results are for the infinitesimal generator of a diffusion process, that is, for the limit of the random walk Laplacian $L_{rw} = I - D^{-1}W$, it is easy to generalize them to the unnormalized Laplacian $L_u = D - W = DL_{rw}$ and symmetrically normalized Laplacian $L_{norm} = I - D^{-1/2}WD^{-1/2} = D^{1/2}L_{rw}D^{-1/2}$.
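
The algebraic relations among the three Laplacians are easy to check numerically. The sketch below uses a small symmetric weight matrix chosen purely for illustration; the variable names are not from the paper.

```python
import numpy as np

# A small symmetric weight matrix (illustrative choice).
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

deg = W.sum(axis=1)
D = np.diag(deg)
D_half = np.diag(np.sqrt(deg))
D_inv_half = np.diag(1.0 / np.sqrt(deg))
I = np.eye(len(W))

L_rw = I - W / deg[:, None]                 # I - D^{-1} W
L_u = D - W                                 # unnormalized Laplacian
L_norm = I - D_inv_half @ W @ D_inv_half    # symmetrically normalized Laplacian

u_matches = np.allclose(L_u, D @ L_rw)                         # L_u = D L_rw
norm_matches = np.allclose(L_norm, D_half @ L_rw @ D_inv_half) # similarity transform
print(u_matches, norm_matches)
```

Because $L_{norm}$ is a similarity transform of $L_{rw}$, the two share eigenvalues, while $L_u$ differs from $L_{rw}$ by the degree weighting that appears as the factor $d$ in the limits below.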
Corollary 4. Take the assumptions in Theorem 3 and let $A$ be the limiting operator of the random walk Laplacian. The degree terms $d_n(\cdot)$ converge uniformly a.s. to a function $d(\cdot)$, and

$$-c'_n L^{(n)}_u f \to d \cdot Af \quad a.s.$$

where $c'_n = c_n/h_n^m$. Furthermore, under the additional assumptions $nh_n^{m+4}/\log n \to \infty$, $\sup_{x,y}|w^{(n)}_x(y) - w_x(y)| = o(h_n^2)$, $\sup_{x,y}|r^{(n)}_x(y) - r_x(y)| = o(h_n^2)$, and $d, w_x, r_x \in C^2(\mathcal{M})$, we have

$$-c_n L^{(n)}_{norm} f \to d^{1/2} \cdot A(d^{-1/2}f) \quad a.s.$$

Proof. For any two functions $\phi_1, \phi_2 : \mathcal{M} \to \mathbb{R}$, define $g(\phi_1, \phi_2) = (\phi_1(\cdot), \phi_1(\cdot)\phi_2(\cdot))$. We note that $g$ is a continuous mapping in the $L_\infty$ topology and

$$\left(d_n, -c'_n L^{(n)}_u f\right) = g\left(d_n, -c_n L^{(n)}_{rw} f\right).$$

By the continuous mapping theorem, if $d_n \to d$ a.s. and $-c_n L^{(n)}_{rw} f \to Af$ a.s. in the $L_\infty$ norm, then $-c'_n L^{(n)}_u f \to d \cdot Af$ a.s.

Thus, convergence of the random walk Laplacians implies convergence of the unnormalized Laplacian under the very weak condition of convergence of the degree operator to a bounded function.

Convergence of the normalized Laplacian is slightly trickier. We may write the normalized Laplacian as

$$L^{(n)}_{norm} f = d_n^{1/2} L^{(n)}_{rw}\left(d_n^{-1/2} f\right). \tag{11}$$

Using the continuous mapping theorem, we see that convergence of the normalized Laplacian, $-c_n L^{(n)}_{norm} f \to d^{1/2} A(d^{-1/2} f)$, is equivalent to showing $c_n L^{(n)}_{rw}\left((d_n^{-1/2} - d^{-1/2})f\right) \to 0$. A Taylor expansion of the inverse square root gives that showing $c_n L^{(n)}_{rw}(d_n - d) \to 0$ is sufficient to prove convergence.

We now verify conditions which will ensure that the degree operators converge at the appropriate rate. We further decompose the empirical degree operator into the bias $\mathbb{E}d_n - d$ and the empirical error $d_n - \mathbb{E}d_n$.

Simply carrying out the Taylor expansions to higher order terms in the calculation of the degree function $d_n$ in Eq. (8), and using the refined calculation of the zeroth moment in Lemma 8 in the appendix, the bias of the degree operator is $\mathbb{E}d_n - d = h_n^2 b + o(h_n^2)$ for some uniformly bounded, continuous function $b$. Thus we have

$$c_n L^{(n)}_{rw}(\mathbb{E}d_n - d) = c_n h_n^2\|(I - P_n)b\|_\infty + o(1) = o(1) \tag{13}$$

since $c_n h_n^2$ is constant and $\|(I - P_n)\phi\|_\infty \to 0$ for any continuous function $\phi$.

We also need to check that the empirical error satisfies $\|d_n - \mathbb{E}d_n\|_\infty = o(h_n^2)$. If $nh_n^{m+4}/\log n \to \infty$ then using the Bernstein bound in equation (10) with $\epsilon$ replaced by $h_n^2$ and applying Borel-Cantelli gives the desired result.

□
2.7 Limit as weighted Laplace-Beltrami operator

Under some regularity conditions, the limit given in the main theorem (Theorem 3) yields a weighted Laplace-Beltrami operator.

For convenience, define $\gamma(x) = r_x(x)$, $\omega(x) = w_x(x)$.

Corollary 5. Assume the conditions of Theorem 3 and let $q = p^2\omega\gamma^{m+2}$. If $r_x(y) = r_y(x)$, $w_x(y) = w_y(x)$ for all $x, y \in \mathcal{M}$ and $r_{(\cdot)}(\cdot), w_{(\cdot)}(\cdot)$ are twice differentiable in a neighborhood of $(x, x)$ for all $x$, then for $c'_n = Z_{K_0,m}/h_n^{m+2}$

$$-c'_n L^{(n)}_u \to \frac{q}{p}\Delta_q. \tag{14}$$

Proof. Note that $\nabla|_{y=x}\gamma(y) = 2\nabla|_{y=x}r_x(y)$. The result follows from application of Theorem 3, Corollary 4 and the definition of the weighted Laplace-Beltrami operator. □



3 Application to Specific Graph Constructions 

To illustrate Theorem 3, we apply it to calculate the asymptotic limits of graph Laplacians for several widely used graph construction methods. We also apply the general diffusion theory framework to analyze LLE.



3.1 r-Neighborhood and Kernel Graphs 



In the case of the r-neighborhood graph, the Laplacian is constructed using a kernel with fixed bandwidth and normalization. The base kernel is simply the indicator function $K_0(x) = I(|x| < r)$. The radius $r_x(y)$ is constant so $\dot{r}(x) = 0$. The drift is given by $\mu_s(x) = \nabla p(x)/p(x)$ and the diffusion term is $\sigma_s(x)\sigma_s(x)^T = I$. The limit operator is thus

$$\frac{1}{2}\Delta_{\mathcal{M}} - \frac{\nabla p(x)^T}{p(x)}\nabla = \frac{1}{2}\Delta_{p^2},$$

as expected. This analysis also holds for arbitrary kernels of bounded variation. One may also introduce the usual weight function $w^{(n)}_x(y) = d_n(x)^{-\alpha}d_n(y)^{-\alpha}$ to obtain limits of the form $\frac{1}{2}\Delta_{p^{2-2\alpha}}$. These limits match those obtained by Hein et al. (2007) and Lafon (2004) for smooth kernels.



3.2 Directed k-Nearest Neighbor Graph 

For kNN graphs, the base kernel is still the indicator kernel, and the weight function is constant 1. However, the bandwidth function $r^{(n)}_x(y)$ is random and depends on $x$. Since the graph is directed, it does not depend on $y$ so $\dot{r}_x = 0$.

By the analysis in Section 3.4, $r_x(x) = cp^{-1/m}(x)$ for some constant $c$. Consequently the limit operator is proportional to

$$\frac{1}{p^{2/m}}\left(\Delta_{\mathcal{M}} - 2\frac{\nabla p^T}{p}\nabla\right) = \frac{1}{p^{2/m}}\Delta_{p^2}.$$

Note that this is generally not a self-adjoint operator in $L_2(p)$. The symmetrization of the graph has the non-trivial effect of making the graph Laplacian self-adjoint.

3.3 Undirected k-Nearest Neighbor Graph

We consider the OR-construction where the nodes $v_i$ and $v_j$ are linked if $v_i$ is a k-nearest neighbor of $v_j$ or vice versa. In this case $h_n r^{(n)}_x(y) = \max\{\rho_n(x), \rho_n(y)\}$ where $\rho_n(x)$ is the distance to the $k_n^{th}$ nearest neighbor of $x$. The limit bandwidth function is non-differentiable, $r_x(y) = \max\{p^{-1/m}(x), p^{-1/m}(y)\}$, but a Taylor-like expansion exists with $\dot{r}_x(x) = -\frac{1}{2m}p^{-1/m}(x)\frac{\nabla p(x)}{p(x)}$. The limit operator is

$$\frac{1}{p^{2/m}}\Delta_{p^{1-2/m}},$$

which is self-adjoint in $L_2(p)$. Surprisingly, if $m = 1$ then the kNN graph construction induces a drift away from high density regions.
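
The sign of the drift can be traced by substituting the limit bandwidth into the drift formula of Theorem 3. The following worked derivation is a sketch, assuming $w_x \equiv 1$ and ignoring the constant factor in $r_x$; the factor $\tfrac{1}{2}$ in $\dot{r}_x$ comes from the max, which passes only half of the gradient of $p^{-1/m}$ into the linear term of the Taylor-like expansion.

```latex
\begin{align*}
r_x(y) &= \max\{p^{-1/m}(x),\, p^{-1/m}(y)\}, \qquad r_x(x) = p^{-1/m}(x),\\
\dot r_x(x) &= \tfrac12 \nabla p^{-1/m}(x)
             = -\frac{1}{2m}\, p^{-1/m}(x)\, \frac{\nabla p(x)}{p(x)},\\
\mu_s(x) &= r_x(x)^2\left(\frac{\nabla p(x)}{p(x)} + (m+2)\frac{\dot r_x(x)}{r_x(x)}\right)
          = p^{-2/m}(x)\left(1 - \frac{m+2}{2m}\right)\frac{\nabla p(x)}{p(x)}\\
         &= p^{-2/m}(x)\left(\frac12 - \frac1m\right)\frac{\nabla p(x)}{p(x)}.
\end{align*}
```

The coefficient $\frac12 - \frac1m$ equals $-\frac12$ for $m = 1$, so the drift points down the density gradient; it vanishes for $m = 2$ and is positive for $m > 2$. This matches the exponent $1 - 2/m$ in the limit operator.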



3.4 Conditions for kNN convergence 



To complete the analysis, we must check the conditions for kNN graph constructions to satisfy the assumptions of the main theorem. This is a straightforward application of existing uniform consistency results for kNN density estimation.

Let $h_n = (k_n/n)^{1/m}$. The condition we must verify is

$$\sup_{x,y}\left|r^{(n)}_x(y) - r_x(y)\right| = O(h_n^2) \quad a.s.$$

We check this for the directed kNN graph, but analyses for other kNN graphs are similar. The kNN density estimate of Loftsgaarden & Quesenberry (1965) is

$$\hat{p}_n(x) = \frac{k_n}{V_m\,n\left(h_n r^{(n)}_x(x)\right)^m} \tag{15}$$

where $V_m$ is the volume of the unit ball in $\mathbb{R}^m$ and $h_n r^{(n)}_x(x)$ is the distance to the $k_n^{th}$ nearest neighbor of $x$ given $n$ data points. Taylor expanding equation (15) shows that if $\|\hat{p}_n - p\|_\infty = O(h_n^2)$ a.s. then the requirement on the location dependent bandwidth for the main theorem is satisfied.

Devroye & Wagner (1977)'s proof for the uniform consistency of kNN density estimation may be easily modified to show this. Take $\epsilon = (k_n/n)^{2/m}$ in their proof. One then sees that $h_n^m = k_n/n \to 0$ and [a growth condition on $k_n$ relative to $\log n$, not recoverable from the source] are sufficient to achieve the desired bound on the error.
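
For reference, a minimal NumPy sketch of the kNN density estimate is given below. It uses the standard Loftsgaarden-Quesenberry form $k/(n V_m \rho^m)$, with $\rho$ the distance to the $k$-th nearest neighbor; the function names and the uniform grid used as sample data are illustrative assumptions.

```python
import math
import numpy as np

def unit_ball_volume(m):
    """Volume V_m of the unit ball in R^m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)

def knn_density(X, k):
    """kNN density estimate k / (n * V_m * rho^m) at every sample point,
    where rho is the distance to the k-th nearest neighbor (self excluded)."""
    n, m = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.sort(dist, axis=1)[:, k]
    return k / (n * unit_ball_volume(m) * rho ** m)

# 101 equally spaced points on [0, 1]: the underlying density is 1.
X = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
p_hat = knn_density(X, k=10)
print(p_hat[50])   # estimate at the midpoint, close to 1
```

Inverting the estimate gives a neighbor distance proportional to $\hat{p}_n^{-1/m}$, which is the source of the limit bandwidth $r_x(x) = cp^{-1/m}(x)$ used for kNN graphs.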
3.5 "Self-Tuning" Graphs

The form of the kernel used in self-tuning graphs is

$$K_n(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma_n(x)\sigma_n(y)}\right)$$

where $\sigma_n(x) = \rho_n(x)$, the distance between $x$ and the $k^{th}$ nearest neighbor. The limit bandwidth function is $r_x(y) = \sqrt{p^{-1/m}(x)\,p^{-1/m}(y)}$. Since this is twice differentiable, Corollary 5 gives the asymptotic limit, which is the same as for undirected kNN graphs,

$$p^{-2/m}\Delta_{p^{1-2/m}}.$$

3.6 Locally Linear Embedding 

Locally linear embedding (LLE), introduced by Roweis & Saul (2000), has been noted to behave like (the square of) the Laplace-Beltrami operator (Belkin & Niyogi, 2003).



Using our kernel-free framework we will show how LLE differs from weighted 
Laplace-Beltrami operators and graph Laplacians in several ways. 1) LLE has, 
in general, no well-defined asymptotic limit without additional conditions on the 
weights. 2) It can only behave like an unweighted Laplace-Beltrami operator. 
3) It is affected by the curvature of the manifold, and the curvature can cause 
LLE to not behave like any elliptic operator (including the Laplace-Beltrami 
operator). 

The key observation is that LLE only controls for the drift term in the 
extrinsic coordinates. Thus, the diffusion term has freedom to vary. However, 
if the manifold has curvature, the drift in extrinsic coordinates constrains the 
diffusion term in normal coordinates. 

The LLE matrix is defined as $(I - W)^T(I - W)$ where $W$ is a weight matrix which minimizes reconstruction error, $W = \arg\min_{W'} \|(I - W')y\|$, under the constraints $W'\mathbf{1} = \mathbf{1}$ and $W'_{ij} \neq 0$ only if $j$ is one of the $k$ nearest neighbors of $i$. Typically $k > m$, and the reconstruction error is $0$. We will analyze the matrix $M = I - W$.

Suppose LLE produces a sequence of matrices $M_n = I - W_n$. The row sums of $M_n$ are 0. Thus, we may decompose $M_n = A_n^+ - A_n^-$ where $A_n^+, A_n^-$ are generators for finite state Markov processes obtained from the positive and negative weights respectively. Assume that there is some scaling $c_n$ such that $c_n A_n^+, c_n A_n^-$ converge to generators of diffusion processes with drifts $\mu_+, \mu_-$ and diffusion terms $\sigma_+\sigma_+^T, \sigma_-\sigma_-^T$. Set $\mu = \mu_+ - \mu_-$ and $\sigma\sigma^T = \sigma_+\sigma_+^T - \sigma_-\sigma_-^T$.

No well-defined limit. We first show there is generally no well-defined asymptotic limit when one simply minimizes reconstruction error. Suppose $\mathrm{rank}(L_x) < m(m+1)/2$ at $x$. This will necessarily be true if the extrinsic dimension $b < m(m+1)/2 + m$. For simplicity assume $\mathrm{rank}(L_x) = 0$. Minimizing the LLE reconstruction error does not constrain the diffusion term, and $\sigma(x)\sigma(x)^T$ may be chosen arbitrarily. Choose asymptotic diffusion $\sigma\sigma^T$ and drift $\mu$ terms that are Lipschitz so that a corresponding diffusion process necessarily exists. A diffusion with terms $2\sigma\sigma^T$ and $\mu$ will also exist in that case.

One may easily construct graphs for the positive and negative weights with these asymptotic diffusion and drift terms by solving highly underdetermined quadratic programs. Furthermore, in the interior of the manifold, these graphs may be constructed so that the finite sample drift terms are exactly equal by adding an additional constraint. Thus, $c_n A_n^+ \to 2G_0 + \mu^T\nabla$ and $c_n A_n^- \to G_0 + \mu^T\nabla$ where $G_0$ is the generator for a diffusion process with zero drift and diffusion term $\sigma(x)\sigma(x)^T$. We have $c_n M_n = c_n A_n^+ - c_n A_n^- \to G_0$. Thus, we can construct a sequence of LLE matrices that have 0 reconstruction error but have an arbitrary limit. It is trivial to see how to modify the construction when $0 < \mathrm{rank}(L_x) < m(m+1)/2$.

No drift. Since $\mu_s(x) = 0$, if the LLE matrix does behave like a Laplace-Beltrami operator, it must behave like an unweighted one, and the density has no effect on the drift.

Curvature and limit. We now show that the curvature of the manifold affects LLE and that the LLE matrix may not behave like any elliptic operator. If the manifold has sufficient curvature, namely if the extrinsic coordinates have dimension $b \geq m + m(m+1)/2$ and $\mathrm{rank}(L_x) = m(m+1)/2$, then the diffusion term in the normal coordinates is fully constrained by the drift term in the extrinsic coordinates.

Recall from equation (1) that the extrinsic coordinates as a function of the normal coordinates are $y = x + H_x s + L_x(ss^T) + O(\|s\|^3)$. By linearity of $H_x$ and $L_x$, the asymptotic drift in the extrinsic coordinates is $\mu_y(x) = H_x\mu_s(x) + L_x(\sigma_s(x)\sigma_s(x)^T)$.

Since reconstruction error in the extrinsic coordinates is 0, we have in normal coordinates

$$\mu_s(x) = 0 \quad\text{and}\quad L_x(\sigma_s(x)\sigma_s(x)^T) = 0.$$

In other words, the asymptotic drift and diffusion terms of $A_n^+$ and $A_n^-$ must be the same, and $c_n M_n \to G_0 - G_0 = 0$.

This implies that the scaling $c_n$ where LLE can be expected to behave like an elliptic operator gives the trivial limit 0. If another scaling yields a non-trivial limit, it may include higher-order differential terms. It is easy to see that when $L_x$ is not full rank, the curvature affects LLE by partially constraining the diffusion term.

Regularization and LLE. We note that while the LLE framework of mini- 
mizing reconstruction error can yield ill-behaved solutions, practical implemen- 
tations add a regularization term when constructing the weights. This causes 
the reconstruction error to be non-zero in general and gives unique solutions 
for the weights which favor equal weights (and asymptotic behavior like kNN 
graphs). 
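The effect of the regularization term on the weights can be seen in a small sketch. The code below assumes the common ridge-type regularization in which the local Gram matrix $G$ is replaced by $G + \mathrm{reg}\cdot\mathrm{trace}(G)\,I$ before solving $Gw = \mathbf{1}$ and normalizing; this particular form, the helper `solve`, and the sample points are assumptions made for illustration and are not taken from the analysis above.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lle_weights(x, neighbors, reg):
    """Reconstruction weights for x from its neighbors.

    Minimizes ||x - sum_j w_j y_j||^2 + reg * trace(G) * ||w||^2 subject to
    sum_j w_j = 1, where G is the local Gram matrix of the differences.
    """
    k = len(neighbors)
    diffs = [[yi - xi for xi, yi in zip(x, y)] for y in neighbors]
    G = [[sum(a * b for a, b in zip(di, dj)) for dj in diffs] for di in diffs]
    ridge = reg * sum(G[i][i] for i in range(k))
    for i in range(k):
        G[i][i] += ridge
    w = solve(G, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical example with k = 4 neighbors in m = 2 dimensions (k > m).
nbrs = [(1.0, 0.0), (0.0, 1.0), (-1.0, -0.5), (0.5, -1.0)]
w_small = lle_weights((0.0, 0.0), nbrs, reg=1e-3)
w_large = lle_weights((0.0, 0.0), nbrs, reg=1e6)
```

With a large regularization constant the weights are nearly equal, consistent with the remark that regularized solutions favor equal weights; with a small constant the weights are free to take mixed signs.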



[Figure 1 panels: (A) Gaussian Manifold; (B) Kernel Laplacian embedding; (C) Raw kNN Laplacian Embedding; (D) Rescaled kNN Laplacian Embedding]

Figure 1: (A) shows a 2D manifold where the x and y coordinates are drawn from a truncated standard normal distribution. (B-D) show embeddings using different graph constructions. (B) uses a normalized Gaussian kernel $\frac{K(x,y)}{d(x)^{1/2}d(y)^{1/2}}$, (C) uses a kNN graph, and (D) uses a kNN graph with edge weights $\sqrt{p(x)p(y)}$. The bandwidth for (B) was chosen to be the median standard deviation from taking 1 step in the kNN graph.



4 Experiments 

To illustrate the theory, we show how to correct the bad behavior of the kNN 
Laplacian for a synthetic data set. We also show how our analysis can predict 
the surprising behavior of LLE. 

kNN Laplacian. We consider a non-linear embedding example which almost all non-linear embedding techniques handle well but on which the kNN graph Laplacian performs poorly. Figure 1 shows a 2D manifold embedded in 3 dimensions and embeddings using different graph constructions. The theoretical limit of the normalized Laplacian $L_{knn}$ for a kNN graph is $L_{knn} = \frac{1}{p}\Delta_1$ while the limit for a graph with Gaussian weights is $L_{gauss} = \Delta_p$. The first 2 coordinates of each point are from a truncated standard normal distribution, so the density at the boundary is small and the effect of the $1/p$ term is substantial. This yields the bad behavior shown in Figure 1 (C). We may use the relationship between the $k$-th nearest neighbor and the density in Eqn (15) to obtain a pilot estimate $p_n$ of the density. Choosing $w_x(y) = \sqrt{p_n(x)p_n(y)}$ gives a weighted kNN graph with the same limit as the graph with Gaussian weights. Figure 1 (D) shows that this change yields roughly the desired behavior but with fewer "holes" in low density regions and more in high density regions.

[Figure 2 panels: (A) Toroidal helix; (B) Laplacian]

Figure 2: (A) shows a 1D manifold isometric to a circle. (B-D) show the embeddings using (B) Laplacian eigenmaps, which correctly identifies the structure, (C) LLE with regularization 1e-3, and (D) LLE with regularization 1e-6.
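The rescaling used for panel (D) can be sketched as follows. This is a hedged illustration, not the code used for the experiment: it assumes a directed kNN graph, a pilot density of the form $k/(n V_m d_k^m)$, and edge weights $\sqrt{p_n(x)p_n(y)}$; the function name, the seed, and the Gaussian sample are hypothetical.

```python
import math
import random

def knn_density_weights(points, k):
    """Directed kNN edges i -> j with weight sqrt(p_n(x_i) * p_n(x_j)).

    p_n is the kNN pilot density estimate k / (n * V_m * d_k^m), with d_k
    the distance to the k-th nearest neighbor (excluding the point itself).
    """
    n, m = len(points), len(points[0])
    v_m = math.pi ** (m / 2) / math.gamma(m / 2 + 1)
    order = [sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    # order[i][0] is i itself, so order[i][k] is its k-th nearest neighbor.
    d_k = [math.dist(points[i], points[order[i][k]]) for i in range(n)]
    p_n = [k / (n * v_m * d ** m) for d in d_k]
    return {(i, j): math.sqrt(p_n[i] * p_n[j])
            for i in range(n) for j in order[i][1:k + 1]}

rng = random.Random(0)  # fixed seed; the Gaussian sample is a stand-in
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
weights = knn_density_weights(pts, k=8)
```

Edges in low density regions receive small weights, which counteracts the $1/p$ factor in the unweighted kNN limit.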

LLE. We consider another synthetic data set, the toroidal helix, in which the manifold structure is easy to recover. Figure 2 (A) shows the manifold, which is clearly isometric to a circle, a fact picked up by the kNN Laplacian in Figure 2 (B).

Our theory predicts that the heuristic argument that LLE behaves like the Laplace-Beltrami operator will not hold. Since the total dimension for the drift and diffusion terms is 2 and the global coordinates also have dimension 2, there is forced cancellation of the first and second order differential terms, and the operator should behave like the 0 operator or include higher order differentials. In Figure 2 (C) and (D), we see that LLE performs poorly and that the behavior comes closer to the 0 operator when the regularization term is smaller.
5 Remarks and Discussion



5.1 Non-shrinking neighborhoods 

In this paper, we have presented convergence results using results for diffusion processes without jumps. Graphs constructed using a fixed, non-shrinking bandwidth do not fit within this framework, but approximation theorems for diffusion processes with jumps still apply (see Jacod & Shiryaev (2003)). Instead of being characterized by the drift and diffusion pair $\mu(x), \sigma(x)\sigma(x)^T$, the infinitesimal generator for a diffusion process with jumps is characterized by the "Lévy-Khintchine" triplet consisting of the drift, diffusion, and "Lévy measure." Given a sequence of transition kernels $K_n$, the additional requirement for convergence of the limiting process is the existence of a limiting transition kernel $K$ such that $\int K_n(\cdot, dy)g(y) \to \int K(\cdot, dy)g(y)$ locally uniformly for all functions $g$. This establishes an impossibility result: no method that only assigns positive mass on shrinking neighborhoods can have the same graph Laplacian limit as a kernel construction method where the bandwidth is fixed.

5.2 Convergence rates 

We note that one missing element in our analysis is the derivation of convergence 
rates. For the main theorem, we note that it is, in fact, not necessary to apply 
a diffusion approximation theorem. Since our theorem still uses a kernel (albeit 
one with much weaker conditions), a virtually identical proof can be obtained 
by applying a function $f$ and Taylor expanding it. Thus, we believe that similar convergence rates to Hein et al. (2007) can be obtained. Also, while our convergence result is stated for the strong operator topology, the same conditions as in Hein et al. (2007) give weak convergence.

5.3 Relation to density estimation 

The connection between kernel density estimation and graph Laplacians is obvious, namely, any kernel density estimation method using a non-negative kernel induces a random walk graph Laplacian and vice versa.

As a consequence of identifying the asymptotic degree term, we have shown consistency of a wide class of adaptive kernel density estimates on a manifold. We have also shown that on compact sets, the bias term is uniformly bounded by a term of order $h^2$, and a small modification to the Bernstein bound (Eqn (10)) gives that the variance is bounded by a term of order $\frac{1}{nh^m}$, both of which one would expect. This generalizes previous work on manifold density estimation by Pelletier (2005) and Ozakin (2009) to adaptive kernel density estimation.

The well-studied field of kernel density estimation may also lead to insights on how to choose a good location dependent bandwidth. We compare the form of our density estimates to other well-known adaptive kernel density estimation techniques. The balloon estimator and sample smoothing estimators as described by Terrell & Scott (1992) are respectively given by

$$\hat f_1(x) = \frac{1}{n h(x)^m}\sum_{i=1}^{n} K\left(\frac{\|x_i - x\|}{h(x)}\right) \tag{16}$$

$$\hat f_2(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h(x_i)^m} K\left(\frac{\|x_i - x\|}{h(x_i)}\right) \tag{17}$$



In the univariate case, Terrell & Scott (1992) show that the balloon estimators yield no improvement to the asymptotic rate of convergence over fixed bandwidth density estimates. The sample smoothing estimator gives a density estimate which does not necessarily integrate to 1. However, it can exhibit better asymptotic behavior in some cases. The Abramson square root law estimator (Abramson, 1982) is an example of a sample smoothing estimator and takes $h(x_i) = h\,p(x_i)^{-1/2}$. On compact intervals, this estimator has bias of order $h^4$ rather than the usual $h^2$ (Silverman, 1998), and it achieves this bias reduction without resorting to higher order kernels, which are necessarily negative in some region. However, the bias in the tail for univariate Gaussian data is of order $(h/\log h)^2$ (Terrell & Scott, 1992), which is only marginally better than $h^2$.

We do not make claims of being able to reduce bias in the case of density estimation on a manifold. In fact, we do not believe bias reduction to the order of $h^4$ is possible unless one makes some use of manifold curvature information. Nevertheless, the existing density estimation literature suggests what potential benefits one may achieve over different regions of a density.
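The difference between the two adaptive estimators is easiest to see in code. The following univariate sketch assumes a Gaussian kernel and writes the bandwidth as a function `h`; the function names, the data, and the use of the standard normal density as a stand-in pilot in the Abramson-style bandwidth are all assumptions made for illustration.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def balloon_estimate(x, data, h):
    """Balloon estimator: the bandwidth h(x) depends on the query point."""
    hx = h(x)
    return sum(gaussian_kernel((xi - x) / hx) for xi in data) / (len(data) * hx)

def sample_smoothing_estimate(x, data, h):
    """Sample smoothing estimator: each data point x_i has bandwidth h(x_i)."""
    return sum(gaussian_kernel((xi - x) / h(xi)) / h(xi) for xi in data) / len(data)

def abramson_bandwidth(x, h0=0.5):
    """Square root law h0 * p(x)^(-1/2) with a stand-in pilot density p."""
    pilot = gaussian_kernel(x)  # hypothetical pilot: standard normal density
    return h0 * pilot ** -0.5

data = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
constant = lambda x: 0.5
f_balloon = balloon_estimate(0.2, data, constant)
f_sample = sample_smoothing_estimate(0.2, data, constant)
f_abramson = sample_smoothing_estimate(0.2, data, abramson_bandwidth)
```

With a constant bandwidth the two estimators coincide; they differ only once the bandwidth varies with location, either at the query point or at the sample points.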



5.4 Eigenvalues/Eigenvectors 

Fixed bandwidth case. We find our location dependent bandwidth results to be of interest in the context of the negative result in von Luxburg et al. (2008) for unnormalized Laplacians with a fixed bandwidth. Their results state that
for unnormalized graph Laplacians, the eigenvectors of the discrete approxima- 
tions do not converge if the corresponding eigenvalues lie in the range of the 
asymptotic degree operator d{x), whereas for the normalized Laplacian, the "de- 
gree operator" is the identity and the eigenvectors converge if the corresponding 
eigenvalues stay away from 1. Our results suggest that even with unnormalized 
Laplacians, one can obtain convergence of the eigenvectors by manipulating the 
range of the degree operator through the use of a location dependent bandwidth 
function. For example, with kNN graphs we have that the degree operator is 
essentially 1. For self-tuning graphs, the degree operator also converges to 1, 
and since the kernels form an equicontinuous family of functions, the theory 
for compact integral operators may be rigorously applied when the bandwidth 
scaling is fixed. 

Thus we can obtain unnormalized and normalized graph Laplacians that (1) have spectra that converge for fixed (non-decreasing) bandwidth scalings and (2) converge to a limit that is different from that of previously analyzed normalized Laplacians when the bandwidth decreases to 0.
Corollary 6. Assume the standard assumptions. Further assume that for some $h_0 > 0$, $\left\{K_0\left(\frac{\|x - y\|}{h\,r_x(y)}\right) : h > h_0\right\}$ form an equicontinuous family of functions. Let $q, g \in C^2(\mathcal{M})$ be bounded away from $0$ and $\infty$. Set




rAy) = VlWf^ (18) 



Wxiy) = vM^^My)- (19) 

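If $h_n = h_1$ for all $n$, then the eigenvectors of the normalized Laplacians converge in the sense given in von Luxburg et al. (2008). If $h_n \downarrow 0$ satisfy the assumptions of the main theorem, then the limit rescaled degree operator is $d = g$ and

$$-c_n L^{(n)}_{norm} f \to g^{-1/2}\,\frac{q}{p}\,\Delta_q\left(g^{-1/2}f\right) \tag{20}$$

which induces the smoothness functional

$$\left\langle f,\; -g^{-1/2}\,\frac{q}{p}\,\Delta_q\left(g^{-1/2}f\right)\right\rangle_{L_2(p)} = \left\langle \nabla\left(g^{-1/2}f\right), \nabla\left(g^{-1/2}f\right)\right\rangle_{L_2(q)}. \tag{21}$$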



Proof. Assume the $h_n \downarrow 0$ case. Use corollary 5 and solve for $\omega$ and $\gamma$ in the system of equations: $q = p^2\omega\gamma^{m+2}$, $g = p\,\omega\gamma^{m}$. In the $h_n = h_1$ case, the conditions satisfy those given in von Luxburg et al. (2008) with the modification that the kernel is not bounded away from 0 and the additional assumption that $p$ is bounded away from 0. Thus, the asymptotic degree operator $d$ is bounded away from 0, and the proofs in von Luxburg et al. (2008) remain unchanged. □
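The system of equations in the proof can be solved pointwise in closed form. The sketch below assumes the system reads $q = p^2\omega\gamma^{m+2}$ and $g = p\,\omega\gamma^m$, with $\omega$ playing the role of a weight term and $\gamma$ of a bandwidth term; the function name is hypothetical.

```python
def solve_bandwidth_and_weight(p, q, g, m):
    """Solve q = p^2 * omega * gamma^(m+2), g = p * omega * gamma^m
    pointwise for (omega, gamma), given positive values p, q, g."""
    gamma = (q / (p * g)) ** 0.5      # divide the two equations: q/g = p * gamma^2
    omega = g / (p * gamma ** m)      # back-substitute into the second equation
    return omega, gamma

# kNN-type check: q = p^(1 - 2/m) and g = 1 should give gamma = p^(-1/m), omega = 1.
p, m = 0.7, 3
omega, gamma = solve_bandwidth_and_weight(p, p ** (1 - 2 / m), 1.0, m)
```

Under this reading, the kNN construction is recovered as the special case of unit weights and bandwidth $p^{-1/m}$, matching the kNN limit discussed earlier.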



We note that the restriction to an equicontinuous family of kernel functions 
excludes kNN graph constructions. However, one may get around this by considering the two-step transition kernels $K_2(x, y) = K(x, \cdot) * K(\cdot, y)$, where $*$ denotes the convolution operator with respect to the underlying density. For indicator kernels like those used in kNN graph constructions, $K_2$ will be Lipschitz and hence form an equicontinuous family. Thus, if one handles the potential
issues with the random bandwidth function, one may apply the theory of com- 
pact integral operators to obtain convergence of the spectrum and eigenvectors 
for kNN graph Laplacians when k grows appropriately. 



5.5 Reasons for choosing a graph construction method 

We highlight how our more general kernel can yield advantageous properties. In particular, it yields graph constructions where one can (1) control the sparsity of the Laplacian matrix, (2) control connectivity properties in low density regions, (3) give asymptotic limits that cannot be attained using previous graph construction methods, and (4) give Laplacians with good spectral properties in the non-shrinking bandwidth case.

One way to control (1) and (2) is to make the binary choice of using kNN or a kernel with uniform bandwidth to construct the graph. Our results show that, by using a pilot estimate of the density, one can obtain sparsity and connectivity properties in the continuum between these two choices.

For (3) and (4), we note that the limits for previously analyzed unnormalized Laplacians were of the form $p^{\alpha-1}\Delta_{p^\alpha}$. Using corollary 5, one sees that limits of the form $\frac{q}{p}\Delta_q$ for any smooth, bounded density $q$ on the manifold can be obtained. Equivalently, one can approximate the smoothness functional $\|\nabla f\|^2_{L_2(q)}$ for almost any $q$, not just $p^\alpha$.

For normalized Laplacians, which have good spectral properties, the previously known limits induced smoothness functionals of the form $\|\nabla(p^{(1-\alpha)/2}f)\|^2_{L_2(p^\alpha)}$. With our more general kernel and any $g, q \in C^2(\mathcal{M})$, we may induce a smoothness functional of the form $\|\nabla(g^{-1/2}f)\|^2_{L_2(q)}$. In particular, in the interesting case where $g = 1$ and the smoothness functional is just a norm on the gradient of $f$, i.e. $\|\nabla f\|^2_{L_2(q)}$, $q$ may be chosen to be almost any density, not just $q = p$.



6 Conclusions 

We have introduced a general framework that enables us to analyze a wide 
class of graph Laplacian constructions. Our framework reduces the problem of 
graph Laplacian analysis to the calculation of a mean and variance (or drift 
and diffusion) for any graph construction method with positive weights and 
shrinking neighborhoods. Our main theorem extends existing strong operator 
convergence results to non-smooth kernels, and introduces a general location- 
dependent bandwidth function. The analysis of a location-dependent bandwidth 
function, in particular, significantly extends the family of graph constructions for 
which an asymptotic limit is known. This family includes the previously unstud- 
ied (but commonly used) kNN graph constructions, unweighted r-neighborhood 
graphs, and "self-tuning" graphs. 

Our results also have practical significance in graph constructions as they 
suggest graph constructions that (1) can produce sparser graphs than those 
constructed with the usual kernel methods, despite having the same asymptotic 
limit, and (2) in the fixed bandwidth regime, produce normalized Laplacians 
that have well-behaved spectra but converge to a different class of limit operators than previously studied normalized Laplacians. In particular, this class of limits includes those that induce the smoothness functional $\|\nabla f\|^2_{L_2(q)}$ for almost any density $q$. The graph constructions may also (3) have better connectivity properties in low-density regions.
8 Appendix
8.1 Main lemma

Lemma 7 (Integration with location dependent bandwidth). Let $\mathbf{1}$ be the indicator function and $h > 0$ be a constant. Let $r_x$ be a location dependent bandwidth function that satisfies the standard assumptions, i.e. it has a Taylor-like expansion

$$r_x(y) = r_x(x) + \left(\dot r_x(x) + \alpha_x\,\mathrm{sign}(u_x^T s)\,u_x\right)^T s + \epsilon_r(x, s).$$

Let $V_m = \frac{\pi^{m/2}}{\Gamma(m/2 + 1)}$ be the volume of the unit $m$-sphere. Then

$$M_0 = \frac{1}{V_m h^m}\int \mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right) ds = r_x(x)^m + h\,\epsilon_0(x, h)$$

$$M_1 = \frac{1}{V_m h^m}\int s\,\mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right) ds = h^2 r_x(x)^{m+1}\dot r_x(x) + h^3\epsilon_1(x, h)$$

$$M_2 = \frac{1}{V_m h^m}\int ss^T\,\mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right) ds = \frac{h^2}{m+2}\,r_x(x)^{m+2} I + h^3\epsilon_2(x, h)$$

where $\sup_{x \in \mathcal{M},\, h < h_0}\|\epsilon_i(x, h)\| < C_\epsilon$ for some constant $C_\epsilon > 0$.

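Proof. Let $v(s) = \dot r_x(x) + \mathrm{sign}(s^T u_x)\,\alpha_x u_x$. We will show that the set on which the indicator function is nonzero is approximately a sphere shifted by $h^2 r_x(x)v(s)$ with radius $h r_x(x)$.

$$\begin{aligned}
\mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right)
&= \mathbf{1}\left(\|s\|^2 + \|L(ss^T)\|^2 < h^2\left(r_x(x) + v(s)^T s + O(\|s\|^2)\right)^2\right)\\
&= \mathbf{1}\left(\|s\|^2 < h^2\left(r_x(x)^2 + 2r_x(x)v(s)^T s + O(h^2)\right)\right)\\
&= \mathbf{1}\left(\|s\|^2 - 2h^2 r_x(x)v(s)^T s + h^4 r_x(x)^2 v(s)^T v(s) < h^2 r_x(x)^2 + O(h^4)\right)\\
&= \mathbf{1}\left(\|s - h^2 r_x(x)v(s)\| < h r_x(x) + h^3\delta_x(s)\right)
\end{aligned}$$

for some function $\delta_x(s)$. Furthermore, the assumptions on the bounded curvature of the manifold and uniform bounds on the bandwidth function remainder term $\epsilon_r(x, s)$ give that the perturbation term $\delta_x(s)$ may be uniformly bounded by $\sup_{x \in \mathcal{M}}|\delta_x(s)| < C_\delta$ for some constant $C_\delta$.

The result for the zeroth moment follows immediately from this. The results for the first and second moments we calculate in lemma 10. □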
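The leading terms of the three moments can be checked exactly in one dimension. The sketch below assumes $m = 1$ and an exactly linear bandwidth $r(s) = r_0 + r_1 s$ with no asymmetric term, so that the support of the indicator is an interval; the names `moments_1d`, `r0`, and `r1` are hypothetical. It is a sanity check of the leading terms $r^m$, $h^2 r^{m+1}\dot r$, and $\frac{h^2}{m+2}r^{m+2}$, not a proof.

```python
def moments_1d(r0, r1, h):
    """Exact moments of 1(|s| < h * (r0 + r1 * s)) / (V_1 * h) for m = 1.

    The bandwidth r(s) = r0 + r1 * s is linear, so the support is the
    interval (a, b) with a = -h*r0/(1 + h*r1) and b = h*r0/(1 - h*r1).
    Requires r0 > 0 and |h * r1| < 1.
    """
    a = -h * r0 / (1 + h * r1)
    b = h * r0 / (1 - h * r1)
    v1 = 2.0                                  # volume of the unit ball in R^1
    m0 = (b - a) / (v1 * h)
    m1 = (b ** 2 - a ** 2) / (2 * v1 * h)
    m2 = (b ** 3 - a ** 3) / (3 * v1 * h)
    return m0, m1, m2

r0, r1, h = 1.5, 0.4, 0.01
m0, m1, m2 = moments_1d(r0, r1, h)
leading = (r0, h ** 2 * r0 ** 2 * r1, h ** 2 * r0 ** 3 / 3)
```

For small $h$ the exact values agree with the leading terms up to a relative error of order $h^2$ in this example.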

8.1.1 Refined analysis of the zeroth moment

For convergence of the normalized Laplacian, we need a more refined result for the zeroth moment.

Lemma 8. Let

$$r_x(y) = \tilde r_x(s) + \epsilon_r(x, s),$$

where $\tilde r_x(s)$ is twice continuously differentiable as a function of $x$ and $s$ and $\epsilon_r$ is bounded. Then

$$\frac{1}{V_m h^m}\int \mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right) ds = r_x(x)^m + h^2 b(x) + h^2\epsilon_0(x, h)$$

where $b$ is continuous and $\sup_x|\epsilon_0(x, h)| \to 0$ as $h \to 0$.

Proof. We first sketch the idea behind the proof and leave the details to interested readers. One may convert the integral in normal coordinates to an integral in polar coordinates $(R, \theta)$. One may then apply the implicit function theorem to obtain that the unperturbed radius function $R$ is a twice continuously differentiable function of $h$. This gives a Taylor expansion of the zeroth moment with respect to $h$. Bounding the contribution of $\epsilon_r(x, s)$ gives the desired result.

We may express the integral for the zeroth moment in polar coordinates

$$Z_x(h) = \int \frac{1}{V_m h^m}\mathbf{1}\left(\frac{\|y - x\|}{r_x(y)} < h\right) ds = \int R_x(\theta, h)^m\, d\mu_\theta$$

where $\mu_\theta$ is the uniform measure on the surface of the unit $m$-sphere and $\tilde s = s/h = R_x(\theta, h)\theta$ solves the equation

$$\|\tilde s\|^2 + h^2\|L(\tilde s\tilde s^T)\|^2 = \left(r_x(x) + h\nabla r_x(x)^T\tilde s + h^2\tilde s^T\mathcal{H}_{r_x}(0)\tilde s\right)^2$$

and $\mathcal{H}_{r_x}(0)$ is the Hessian of $r_x(\cdot)$ evaluated at 0.

By the implicit function theorem, the solutions $\tilde s$ define a twice continuously differentiable function of $x, h$. For sufficiently small $h > 0$, $\tilde s$ is bounded away from 0 since $r_x$ is bounded away from 0, and $\|s/h\|$ is bounded away from $\infty$ by the bound in lemma 7. Thus, $R_x(\theta, h)$ and $Z_x(h)$ are twice continuously differentiable with bounded second derivatives.

$Z_x(h)$ then has a second-order Taylor expansion $Z_x(h) = Z_x(0) + Z_x'(0)h + \frac{1}{2}Z_x''(0)h^2 + o(h^2)$.

By the less refined analysis in lemma 7 we have that $Z_x(0) = r_x(x)^m$ and $Z_x'(0^+) = 0$. One may apply a squeeze theorem to obtain that the contribution of the error term $\epsilon_r(x, s)$ to the zeroth moment is bounded by $C_r\sup_{x,s}|\epsilon_r(x, s)|$ for some constant $C_r$, and the result follows. □

8.2 Moments of the indicator kernel / Integrating over the centered sphere in normal coordinates

Here we calculate the first three moments of the normalized indicator kernel, where $V_m = \int \mathbf{1}(\|u\| < 1)\,du = \int_{S_m} du$ is the volume of the $m$-dimensional unit sphere in Euclidean space.

Lemma 9 (Moments for the sphere). Let $K(\|s\|/h) = \frac{1}{V_m h^m}\mathbf{1}(\|s\| < h)$. Then the zeroth, first, and second moments are given by:

$$M_0 = \int K(\|s\|/h)\,ds = \frac{1}{V_m}\int_{S_m} ds = 1 + O(h^3)$$

$$M_1 = \int sK(\|s\|/h)\,ds = \frac{h}{V_m}\int_{S_m} s\,ds = 0 + O(h^4)$$

$$M_2 = \int ss^T K(\|s\|/h)\,ds = \frac{h^2}{V_m}\int_{S_m} ss^T ds = \frac{h^2}{m+2}I + O(h^5).$$

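Proof. The $O(\cdot)$ error terms arise trivially after converting normal coordinates to tangent space coordinates. Thus, we may simply treat the integrals as integrals in $m$-dimensional Euclidean space to obtain the leading term. The values for $M_0$ and $M_1$ follow immediately from the definition of the volume $V_m$ and by symmetry of the sphere. We obtain the second moment result by calculating the values on the diagonal and off-diagonal. On the off-diagonal

$$\frac{1}{V_m}\int_{S_m} s_i s_j\,ds = 0$$

for $i \neq j$ due to symmetry of the sphere. On the diagonal

$$\begin{aligned}
\frac{1}{V_m}\int_{S_m} s_1^2\,ds &= \frac{V_{m-1}}{V_m}\int_{-1}^{1} s_1^2(1 - s_1^2)^{(m-1)/2}\,ds_1 && (22)\\
&= \frac{V_{m-1}}{V_m}\int_{-1}^{1} s_1 \times s_1(1 - s_1^2)^{(m-1)/2}\,ds_1 && (23)\\
&= \frac{1}{m+1}\frac{V_{m-1}}{V_m}\int_{-1}^{1}(1 - s_1^2)^{(m+1)/2}\,ds_1 && (24)\\
&= \frac{1}{m+1}\frac{V_{m-1}}{V_m V_{m+1}}\int_{-1}^{1} V_{m+1}(1 - s_1^2)^{(m+1)/2}\,ds_1 && (25)\\
&= \frac{1}{m+1}\frac{V_{m-1}}{V_{m+1}}\frac{V_{m+2}}{V_m} && (26)\\
&= \frac{1}{m+2} && (27)
\end{aligned}$$

where the third equality follows from integration by parts and the last equality uses the recurrence relationship $V_{m+2} = \frac{2\pi}{m+2}V_m$. □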
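The diagonal value $1/(m+2)$ can be confirmed numerically from the one-dimensional integral in (22). The sketch below uses a plain midpoint rule and the closed form $V_m = \pi^{m/2}/\Gamma(m/2+1)$; the function names and the step count are arbitrary choices made for illustration.

```python
import math

def unit_ball_volume(m):
    """Volume V_m of the unit ball in R^m (with V_0 = 1)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)

def second_moment_diagonal(m, steps=20000):
    """Midpoint-rule value of (V_{m-1}/V_m) * int_{-1}^{1} s^2 (1-s^2)^((m-1)/2) ds."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        s = -1.0 + (i + 0.5) * width
        total += s * s * (1.0 - s * s) ** ((m - 1) / 2)
    return unit_ball_volume(m - 1) / unit_ball_volume(m) * total * width

values = {m: second_moment_diagonal(m) for m in range(1, 6)}
```

Each entry of `values` agrees with $1/(m+2)$ up to the quadrature error.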



8.3 Integrating the shifted and perturbed sphere

Here we calculate the moments used in Lemma 7.

The integrals in lemma 7 essentially involve integrating over a sphere with (1) a shifted center $h^2 r_x(x)\dot r_x(x)$, (2) a symmetric shift by $\mathrm{sign}(s^T u_x)\,h^2 r_x(x)\alpha_x u_x$ on two half-spheres, and (3) a small perturbation $h^3\delta_x(s)$.

Lemma 10 (Moments of the shifted and perturbed sphere). Let $v_c \in \mathbb{R}^m$, $u$ be a unit vector in $\mathbb{R}^m$, $\beta \in \mathbb{R}$, and $h > 0$. Define $K(s) = \mathbf{1}(\|s - v_c + \mathrm{sign}(s^T u)\beta u\| < h + h^3\delta)$, so that the support of $K$ is a shifted and perturbed sphere with center $v_c$, symmetric shift $\mathrm{sign}(s^T u)\beta u$, and radius perturbation $h^3\delta$.

Assume $\|v_c\|, |\beta| < Ch^2$ and $\delta < \min\{C, 1\}$ for some constant $C$, and put $h_{max} = h + h^3\delta$.

Then

$$M_0 = \frac{1}{V_m}\int_{\mathbb{R}^m} K(s)\,ds = h^m + e_0$$

$$M_1 = \frac{1}{V_m}\int_{\mathbb{R}^m} sK(s)\,ds = h^m v_c + e_1$$

$$M_2 = \frac{1}{V_m}\int_{\mathbb{R}^m} ss^T K(s)\,ds = \frac{h^{m+2}}{m+2}I + e_2$$

where $e_0 < \kappa C h_{max}^{m+1}$ and $e_i < \kappa C h_{max}^{m+3}$ for $i = 1, 2$ and $\kappa$ is some universal constant that does not depend on $\delta$, $v_c$, or $\beta$.

Proof. Set $H_+ = \{s \in \mathbb{R}^m : u^T s > 0\}$ and $H_- = \{s \in \mathbb{R}^m : u^T s < 0\}$ to be the half-spaces defined by $u$. For a set $H \subset \mathbb{R}^m$, let $H + v_c := \{w + v_c : w \in H\}$.

We first bound the error introduced by the perturbation $h^3\delta$. Define

$$A := \mathrm{supp}(K) = \{s \in \mathbb{R}^m : \|s - v_c + \mathrm{sign}(s^T u)\beta u\| < h + h^3\delta\}$$

$$\tilde A := \{s \in \mathbb{R}^m : \|s - v_c + \mathrm{sign}(s^T u)\beta u\| < h\}$$

so that $\tilde A$ gets rid of the dependence on the perturbation. For any function $Q$, we have a trivial bound

$$\begin{aligned}
\left|\int_A Q(s)\,ds - \int_{\tilde A} Q(s)\,ds\right| &\leq Q_{max}\left|\mathrm{Vol}(A) - \mathrm{Vol}(\tilde A)\right|\\
&\leq Q_{max}V_m\left|h_{max}^m - h^m\right|\\
&\leq Q_{max}V_m\left(m h_{max}^{m-1}\right)\left(h^3\delta\right)\\
&= O(h^{m+2}Q_{max}) && (28)
\end{aligned}$$

where $Q_{max} = \sup_{\|s\| < h_{max}}\|Q(s)\|$ and $mV_{m-1}$ is the surface area of the $m$-dimensional sphere. For $Q(s) = 1/V_m$, $s/V_m$, or $ss^T/V_m$, the corresponding $Q_{max}$ are $1/V_m$, $h_{max}/V_m$, and $h_{max}^2/V_m$. The error induced by the perturbation is thus of the right order.

We now consider the integral over the unperturbed but shifted sphere. Denote by $B_h(v)$ the ball of radius $h$ centered on $v$. Note that the function $\mathbf{1}(s \in \tilde A) = \mathbf{1}(\|s - v_c + \mathrm{sign}(s^T u)\beta u\| < h)$ is symmetric around $v_c$. Thus, for a function $Q(s - v_c)$ which is symmetric around $v_c$,

$$\begin{aligned}
\int_{\tilde A} Q(s - v_c)\,ds &= 2\int_{\tilde A \cap H_+} Q(s - v_c)\,ds\\
&= 2\int_{H_+} Q(s - v_c)\mathbf{1}(\|s - v_c\| < h)\,ds\\
&\quad - 2\int_{H_+} Q(s - v_c)\left(\mathbf{1}(\|s - v_c\| < h) - \mathbf{1}(\|s - v_c + \beta u\| < h)\right)ds\\
&= \int Q(s)\mathbf{1}(\|s\| < h)\,ds\\
&\quad - 2\int_{H_+} Q(s - v_c)\left(\mathbf{1}(s \in B_h(v_c)) - \mathbf{1}(s \in B_h(v_c - \beta u))\right)ds.
\end{aligned}$$

For $Q(s) = 1/V_m$ or $ss^T/V_m$, lemma 9 gives that the value of the main term $\int Q(s)\mathbf{1}(\|s\| < h)\,ds$ is $h^m$ or $\frac{h^{m+2}}{m+2}I$ respectively. The error term is bounded by

$$\begin{aligned}
\left|2\int_{H_+} Q(s - v_c)\left(\mathbf{1}(s \in B_h(v_c)) - \mathbf{1}(s \in B_h(v_c - \beta u))\right)ds\right|
&\leq 2Q_{max}\int\left|\mathbf{1}(s \in B_h(v_c)) - \mathbf{1}(s \in B_h(v_c - \beta u))\right|ds\\
&\leq 2Q_{max}|\beta|\,\mathrm{Area}(H_+ \cap B_h(v_c))\\
&\leq 2Q_{max}|\beta|\left(mV_{m-1}h^{m-1}\right)\\
&\leq 2mV_{m-1}CQ_{max}h^{m+1}
\end{aligned}$$

where $\mathrm{Area}(H_+ \cap B_h(v_c))$ is the surface area of a half-sphere of radius $h$. Plugging in $Q_{max} = 1/V_m$ and $h^2/V_m$ gives that the error terms for the zeroth and second moment calculations are of the right order.

By another symmetry argument, we have for the first moment calculation $\int_{\tilde A}(s - v_c)\,ds = 0$ or equivalently,

$$\frac{1}{V_m}\int_{\tilde A} s\,ds = \frac{v_c}{V_m}\int_{\tilde A} ds = h^m v_c + O(h^{m+3})$$

where the last equality holds from the calculation of the zeroth moment above. More precisely, the error term is bounded by $2mV_{m-1}CQ_{max}h^{m+1}\|v_c\|$.

□
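The orders of the error terms in Lemma 10 can be checked exactly in the simplest case. The sketch below assumes $m = 1$, $u = 1$, and no radius perturbation ($\delta = 0$), so that the support of $K$ is a single interval; the function name and the numerical values are hypothetical.

```python
def shifted_interval_moments(h, v_c, beta):
    """Exact moments for m = 1, u = 1, delta = 0 in Lemma 10.

    For s > 0 the condition is |s - v_c + beta| < h, and for s < 0 it is
    |s - v_c - beta| < h. When |v_c| + |beta| < h the support is the single
    interval (v_c + beta - h, v_c - beta + h).
    """
    a, b = v_c + beta - h, v_c - beta + h
    v1 = 2.0                                   # volume of the unit ball in R^1
    m0 = (b - a) / v1
    m1 = (b ** 2 - a ** 2) / (2 * v1)
    m2 = (b ** 3 - a ** 3) / (3 * v1)
    return m0, m1, m2

h = 0.1
v_c, beta = 0.5 * h ** 2, 0.3 * h ** 2        # both of order h^2, as assumed
m0, m1, m2 = shifted_interval_moments(h, v_c, beta)
```

In this case $M_0 = h - \beta$ and $M_1 = (h - \beta)v_c$ exactly, so the errors relative to $h^m$ and $h^m v_c$ are of order $h^{m+1}$ and $h^{m+3}$ respectively, in line with the stated bounds.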